ABSTRACT

Every culture and language is unique. Our work expressly focuses on the uniqueness of culture and language in relation to human affect, specifically sentiment and emotion semantics, and how they manifest in social multimedia. We develop sets of sentiment- and emotion-polarized visual concepts by adapting semantic structures called adjective-noun pairs, originally introduced by Borth et al. [5], but in a multilingual context. We propose a new language-dependent method for automatic discovery of these adjective-noun constructs. We show how this pipeline can be applied on a social multimedia platform for the creation of a large-scale multilingual visual sentiment concept ontology (MVSO). Unlike the flat structure in [5], our unified ontology is organized hierarchically by multilingual clusters of visually detectable nouns and subclusters of emotionally biased versions of these nouns. In addition, we present an image-based prediction task to show how generalizable language-specific models are in a multilingual context. We also release a new, publicly available dataset of >15.6K sentiment-biased visual concepts across 12 languages with language-specific detector banks, >7.36M images and their metadata.

Categories and Subject Descriptors 

H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding
[Figure 1 image labels: old market in Arabic, English, and French (marché ancien); good food in English, Chinese, and Dutch (lekker eten).]

Figure 1: Example images from “around the world” organized by affective visual concepts. The top set shows images of the old market concept from three different cultures/languages, and the bottom set shows images of good food. Even though the conceptual reference is the same, each culture’s sentimental expression of these concepts may be markedly different.


1. INTRODUCTION 

If you scoured the world, took several people at random from major countries, and asked them to fill in the blank “_ love” in their native tongue, how many unique adjectives would you expect to find? Would people from some cultures tend to fill it with twisted, while others with pure or unconditional or false? All over the world, we daily express our thoughts and feelings in culturally isolated contexts; and when we travel abroad, we know that to cross a physical border also means to cross into the unique behaviors and interactions of that people group, its cultural border. How similar or different are our sentiments and feelings from this other culture? Or the thoughts and objects we tend to talk about most? Motivated by questions like these, our work explores the computational understanding of human affect along cultural lines, with a focus on visual content. In particular, we seek to answer the following important questions: (1) how are images in various languages used to express affective visual concepts, e.g. beautiful place or delicious food? And (2) how are such affective visual concepts used to convey different emotions and sentiment across languages?








[Figure 2 panels, left to right: emotion keywords (e.g. ecstasy, trance; joy, delight; fear, fright; trust, confidence); tags and metadata of returned images; image tags with part-of-speech labels (t1: smiling/ADJ kids/NN; t2: face/NN; t3: southern/ADJ Cambodia/NP; t4: young/ADJ children/NN; t5: had/VBD fun/NN, where ADJ is adjective, NN is noun, NP is proper noun and VBD is verb, past tense); ADJ-NN combinations of the adjectives smiling, young and southern with the nouns kids, face, children and fun; ANP candidates, where smiling fun and young fun are marked semantically incorrect, southern kids, southern face and southern children are marked neutral in sentiment, and southern fun is marked as both.]

Figure 2: The construction process of our multilingual visual sentiment ontology (MVSO) begins with crawling images and metadata based on emotion keywords. Image tags (t1, ..., t5) are labeled with part-of-speech tags, and adjectives and nouns are used to form candidate adjective-noun pair (ANP) combinations, while others are ignored (in red). Finally, these candidate ANPs are filtered based on various criteria (Sec. 3.2) which help remove incorrect pairs (in red), forming a final MVSO with diversity and coverage.
In Psychology, there are two major schools of thought on the connection between cultural context and human affect, i.e. our experiential feelings via our sentiments and emotions. Some believe emotion to be culture-specific [29], that is, emotion is dependent on one’s cultural context, while others believe emotion to be universal [16], that is, emotion and culture are independent mechanisms. For example, while this paper is written in English, there are emotion words/phrases in other languages for which there is no exact translation in English, e.g. Schadenfreude in German refers to pleasure at someone else’s expense. Do English speakers not feel those same emotions, or do they simply refer to them in a different way? Or even if the reference is the same, perhaps the underlying emotion is different?

In Affective Computing and Multimedia, we often refer to the affective gap as the conceptual divide between the low-level visual stimuli, like images and features, and the high-level, abstracted semantics of human affect, e.g. happy or sad. In one attempt to bridge sentiment and visual media, Borth et al. [5] developed a visual sentiment ontology (VSO), a set of 1,200 mid-level concepts using structured semantics called adjective-noun pairs (ANPs). The noun portion of the ANP allows for computer vision detectability and the adjective serves to polarize the noun toward a positive or negative sentiment, or emotion, e.g. instead of having visual concepts like sky or dog, we have beautiful sky or scary dog. Many works like this have built algorithms, models and datasets on the assumption of the psychology theory that emotions are universal. However, while such works provide great research contributions in their native language, their applicability and generalization to other languages and cultures remain largely unexplored.

We present a large-scale multilingual visual sentiment ontology (MVSO) and dataset including adjective-noun pairs from 12 languages of diverse origins: Arabic, Chinese, Dutch, English, French, German, Italian, Persian, Polish, Russian, Spanish, and Turkish. We make the following contributions: (1) a principled, context-aware pipeline for designing a multilingual visual sentiment ontology, (2) a Multilingual Visual Sentiment Ontology mined from social multimedia data end-to-end, (3) an MVSO organized hierarchically into noun-based clusters and sentiment-biased adjective-noun pair subclusters, (4) a multilingual, sentiment-driven visual concept detector bank, and (5) the release of a dataset containing MVSO and a large-scale image collection with benchmark cross-lingual sentiment prediction.^1

2. RELATED WORK 

We address the general challenge of affective image understanding, aiming at both recognition and analysis of sentiment and emotions in visual data, but from a multilingual and cross-cultural perspective. Our work is closely related to Multimedia and Vision research that focuses on visual aesthetics, interestingness, popularity, and creativity [35]. Our work also relates to research in Cognitive and Social Psychology, especially emotion and culture research [34], but also neuroaesthetics [41], visual preference, and social interaction [28].

Progressive research in “visual affect” recognition designed image features based on art and psychology principles for emotion prediction. Such works were later improved by adding social media data in semi-supervised frameworks. From this research effort in visual affect understanding, several affective image datasets were released to the public. The International Affective Picture System (IAPS) dataset is a seminal dataset of ~1,000 images, focused on induced emotions in humans for biometric measurement. The Geneva Affective PicturE Database (GAPED) consists of 730 pictures meant to supplement IAPS and tries to narrow the themes across images. And recently, in [5], a visual sentiment ontology (VSO) and dataset was created from Flickr image data, resulting in a collection of adjective-noun pairs along with corresponding images, tags and sentiment. One major issue with these datasets and existing methods is that they do not consider the context in which emotions are felt and perceived. Instead, they assume that visual affect is universal, and do not account for the influence of culture or language. We explicitly tackle visual affect understanding from a multi-cultural, multilingual perspective. In addition, while existing works often use handpicked data, we gather our data “in the wild” on a popular, multilingual social multimedia platform.

^1 mvso.cs.columbia.edu

English   joy       trust       fear    surprise      sadness      disgust   anger   anticipation
Spanish   alegría   confianza   miedo   sorpresa      tristeza     asco      ira     previsión
Italian   gioia     fiducia     paura   sorpresa      tristezza    disgusto  rabbia  anticipazione
French    bonheur   confiance   peur    surprise      tristesse    dégoût    colère  prévision
German    Freude    Vertrauen   Angst   Überraschung  Traurigkeit  Empörung  Ärger   Vorfreude
Chinese   [keywords not legible]
Dutch     vreugde   vertrouwen  angst   verrassing    verdriet     walging   woede   anticipatie

Table 1: Most representative keywords according to native/proficient speakers for eight basic emotions and for 7 of our 12 languages, chosen and shown top-to-bottom in decreasing no. of discovered visual affect concepts, or adjective-noun pairs.

The study of emotions across language and culture has long been a topic of research in Psychology. A main contention in the area concerns whether emotions are culture-specific [29], i.e. their perception and elicitation varies with the context, or universal [16]. Surveys of cross-cultural work on the semantics surrounding emotion elicitation and perception show that there are still competing views as to whether emotion is pan-cultural, culture-specific, or some hybrid of both. Inspired by research in this domain, we are, as far as we know, the first to investigate the relationship between visual affect and culture^2 from a multimedia-driven and computational perspective.

Other work in cross-lingual research comes from text sentiment analysis and music information retrieval. In text sentiment analysis, multilingual methods have been developed for international online blogs and news articles. In music information retrieval, approaches have been presented for indexing digital music libraries with music from multiple languages. Specific to emotion, prior work tried to highlight differences between languages by building models for predicting musical mood and then cross-predicting in other languages. Unlike these works, we propose a multimedia-driven approach for cross-cultural visual sentiment analysis in the context of online image collections.

It is important to distinguish our work from that of Borth et al. on VSO and its associated detector bank, SentiBank. Their mid-level representation approach has recently proven effective in a wide range of applications in emotion prediction [4, 21], social media commenting, etc. However, in addition to the lack of multilingual support, there are several technical challenges with VSO that we seek to improve on via (1) detection of adjectives and nouns with language-specific part-of-speech taggers, as opposed to a fixed list of adjectives and nouns, (2) automatic discovery of adjective-noun pairs correlated with emotions, as opposed to “constructed” pairs from top frequent adjectives and nouns, and (3) stronger selection criteria based on image tag frequency, linguistic and semantic filters and crowdsourced validation. Our proposed MVSO discovery method can be easily extended to any language, while achieving greater coverage and diversity than VSO.


3. ONTOLOGY CONSTRUCTION 

An overview of the proposed method for multilingual visual sentiment concept ontology construction is shown in Figure 2. In the first stage, we obtain a set of images and their tags using seed emotion keyword queries, selected according to emotion ontologies from psychology. Next, each image tag is labeled automatically by a language-specific part-of-speech tagger and adjective-noun combinations are discovered from words in the tags. Then, the combinations are filtered based on language, semantics, sentiment, frequency and diversity filters to ensure that the final set of ANPs (a) are written in the target language, (b) do not refer to named entities or technical terms, (c) reflect a non-neutral sentiment, (d) are frequently used, and (e) are used by a non-trivial number of users of the target language.

^2 Note that we often use language and culture interchangeably. We define language as the “lens” through which we can observe culture. So while the two can be distinguished, for simplicity, we use them interchangeably.

The discovery of affective visual concepts for these languages using adjective-noun pairs poses several challenges in lexical, structural and semantic ambiguities, which are well-known problems in natural language processing. Lexical ambiguity is when a word has multiple meanings which depend on the context, e.g. sport jaguar or forest jaguar. Structural ambiguity is when a word might have different grammatical interpretations depending on its position in the context, e.g. ambient light or light room. Semantic ambiguity is when a combination of words with the same syntactic structure has different semantic interpretations, e.g. big apple. We selected languages in our MVSO according to the availability of public natural language processing tools and sentiment ontologies per language so that automatic processing was feasible. In addition, we sought to cover a wide range of geographic regions from the Americas to Europe and to Asia. We settled on 12 languages: Arabic, Chinese, Dutch, English, French, German, Italian, Persian, Polish, Russian, Spanish, and Turkish.

We applied our proposed data collection pipeline to a popular social multimedia sharing platform, Yahoo! Flickr,^3 and collected public data from November 2014 to February 2015 using the Flickr API. We selected Flickr because there is an existing body of multimedia research using it, and in particular, prior work describes how Flickr satisfies two conditions for making use of the “wisdom of the social multimedia”: popularity and availability. We do not repeat that argument here, but note that in addition to those benefits, Flickr has multilingual support and the use of Flickr facilitates a natural comparison to the seminal VSO work.


3.1 Adjective-Noun Pair Discovery 

As our seed emotion ontology, we selected Plutchik’s Wheel of Emotions. This psychology ontology was selected because it consists of graded intensities for multiple basic emotions, providing a richer set of emotional valences compared to alternatives like [11]; it has also been shown to be useful for VSO [5]. The Plutchik emotions are organized by eight basic emotions, each with three valences: ecstasy > joy > serenity; admiration > trust > acceptance; terror > fear > apprehension; amazement > surprise > distraction; grief > sadness > pensiveness; loathing > disgust > boredom; rage > anger > annoyance; and, vigilance > anticipation > interest.

^3 www.flickr.com

Language     #images        #tags      #cand  #anps (final)
Arabic       116,125      958,435     15,532             29
Chinese      895,398    3,919,161     50,459            504
Dutch        260,093    4,929,581  1,045,290            348
English    1,082,760   26,266,484  2,073,839          4,421
French       866,166   22,713,978  1,515,607          2,349
German       528,454   10,525,403    854,100            804
Italian      548,134   10,425,139  1,324,076          3,349
Persian      128,546    1,304,613    103,609             15
Polish       294,821    5,261,940    141,889             70
Russian       60,108    1,518,882     30,593            129
Spanish      827,396   15,241,679    925,975          3,381
Turkish      332,609    4,717,389     73,797            231
#total     5,940,610  107,782,684  8,154,766         15,630

Table 2: Ontology refinement statistics over 12 languages. Beginning with many images from seed emotion keywords, denoted by #images, we extracted tags from these images (#tags) and performed adjective-noun pair (ANP) discovery for candidate combinations (#cand). Through a series of filters (frequency, language, semantics, sentiment and diversity) and after crowdsourcing, we obtained our final visual sentiment concepts (#anps).

Multilingual Query Construction: To obtain seeds for each language, we recruited 12 native and proficient language speakers to provide a set of translated or synonymous keywords to those of the 24 Plutchik emotions. Speakers were allowed to use any number of keywords per emotion since the possible synonyms per emotion and language can vary, but they were asked to rank their chosen keywords along each emotion seed. They were also allowed to use tools like Google Translate^4 or other resources to enrich their emotion keywords. Table 1 lists the top ranked keywords according to speakers for 7 out of 12 languages in each emotion.

Given the set of keywords $E^{(l)} = \{e^{(l)}_{ij} \mid i = 1 \ldots 24,\ j = 1 \ldots n_i\}$ describing each emotion $i$ per language $l$, where $n_i$ is the number of keywords per emotion $i$, we performed tag-based queries with the Flickr API to retrieve images and their related tags. Like VSO [5], for each emotion, we chose to sample only the top 50K images ranked by Flickr relevance simply to limit the size of our results, but if an emotion had fewer than 50K images, we extended the search to additional metadata, i.e. title and description.

Part-of-speech Labeling: To identify the type of each word in a Flickr tag, we performed automatic part-of-speech labeling using pre-trained language-specific taggers which achieve high accuracy (>95% for most languages), namely TreeTagger, the Stanford tagger, the HunPos tagger and a morphological analyzer for Turkish. Though not all the tags contained multiple words, the average number of words was always greater than the average number of tags for all languages, so word context was almost always taken into account. From the full set of part-of-speech labels, we retained identified nouns, adjectives and other part-of-speech types which can be used as adjectives, such as participles (e.g. smiling face) in English.

^4 translate.google.com

Discovery Strategy: We based our discovery strategy for ANPs on co-occurrence in image tags, that is, if an adjective-noun pair is relevant to the specific emotion, it should appear at least once as that exact pair phrase in the crawled images for that emotion. To validate the completeness of our strategy, we compared with VSO and found that ~86% of ANPs discovered by VSO overlap with the English ANPs discovered by our method.
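The candidate-forming step can be sketched as follows, as a minimal illustration rather than the exact implementation. It assumes each image's tags have already been part-of-speech labeled, uses the label names ADJ and NN from Figure 2, and pairs every adjective with every noun found in one image's tags; the function name `candidate_anps` and the input layout are hypothetical.

```python
from itertools import product


def candidate_anps(tagged_tags):
    """Form adjective-noun candidates from the POS-labeled tags of one image.

    tagged_tags: list of tags, each a list of (word, pos_label) tuples.
    Words with labels other than ADJ or NN (e.g. proper nouns, verbs)
    are ignored.
    """
    adjectives, nouns = [], []
    for tag in tagged_tags:
        for word, pos in tag:
            if pos == "ADJ" and word not in adjectives:
                adjectives.append(word)
            elif pos == "NN" and word not in nouns:
                nouns.append(word)
    # Every adjective is combined with every noun seen in the image's tags.
    return [f"{adj} {noun}" for adj, noun in product(adjectives, nouns)]


# Example tags in the style of Figure 2.
tags = [
    [("smiling", "ADJ"), ("kids", "NN")],
    [("face", "NN")],
    [("southern", "ADJ"), ("Cambodia", "NP")],
    [("young", "ADJ"), ("children", "NN")],
    [("had", "VBD"), ("fun", "NN")],
]
pairs = candidate_anps(tags)
```

Candidates such as smiling fun or southern kids are still present at this stage; they are meant to be removed by the filters of Sec. 3.2.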


3.2 Filtering Candidate Adjective-Noun Pairs 

From these discovered ANPs, we applied several filters to ensure they satisfied the following criteria: (a) written in the target language, (b) do not refer to named entities, (c) reflect a non-neutral sentiment, (d) frequently used and (e) used by multiple speakers of the language.

Language & Semantics: We used a combination of language dictionaries^5 instead of language classifiers to verify the correctness of the ANP, as the performance of the latter was low for short-length text, especially for Romance languages which share characters. All of the English ANPs were classified as indeed English by the dictionary, while for other languages, ANPs were removed if they passed the English dictionary filter but not the target language dictionary. The intuition for this was that most candidate ANPs in other languages were mixed mostly with English. We removed candidate pairs which referred to named entities or technical terms, where named entities were detected using several public knowledge bases such as Wikipedia and dictionaries for names,^6 cities, regions and countries,^7 and technical terms were removed via a manually created list of words specific to our source domain, Flickr, containing photography-related (e.g. macro, exposure) and camera-related words (e.g. DSLR).
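The dictionary rule for non-English languages can be sketched as below. This is only an illustration under stated assumptions: a pair "passes" a dictionary when all of its words are found in it, and the two word sets are toy stand-ins for the real dictionaries; `in_dictionary` and `keep_for_language` are hypothetical names.

```python
# Toy dictionaries for illustration only.
ENGLISH = {"good", "food", "old", "market", "warm", "water"}
DUTCH = {"lekker", "eten", "warm", "water"}


def in_dictionary(anp, dictionary):
    """Assumption: a pair passes a dictionary if every word is listed in it."""
    return all(word.lower() in dictionary for word in anp.split())


def keep_for_language(anp, target_dict, english_dict):
    """Drop a pair that passes the English dictionary but not the target one."""
    if in_dictionary(anp, english_dict) and not in_dictionary(anp, target_dict):
        return False
    return True


kept = [p for p in ["lekker eten", "good food", "warm water"]
        if keep_for_language(p, DUTCH, ENGLISH)]
```

Note that under this rule a pair found in neither dictionary is not removed here; in the pipeline described above, such pairs would have to be caught by the later filters or by crowdsourcing.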

Non-neutral Sentiment: To filter out neutral candidate adjective-noun pairs, each ANP was scored in sentiment using two publicly available sentiment ontologies: SentiStrength [38] and SentiWordNet. The SentiStrength ontology supported all the languages we considered, but since SentiWordNet could only be used directly for English, we passed in automatic translations in English from all other languages to it, following previous research on multilingual sentiment analysis in machine translation [1, 2].^8 We computed the ANP sentiment score $S(anp) \in [-2, +2]$ as:

$$S(anp) = \begin{cases} S(a) & : \operatorname{sgn}\{S(a)\} \neq \operatorname{sgn}\{S(n)\} \\ S(a) + S(n) & : \text{otherwise} \end{cases} \qquad (1)$$

where $S(a) \in [-1, +1]$ and $S(n) \in [-1, +1]$ are the sentiment scores of the individual adjective and noun words, respectively, each of which is given by the arithmetic mean of the SentiStrength and SentiWordNet scores on the word, and sgn is the sign of the score. The piecewise condition essentially says that if the signs of the sentiment scores of the adjective and noun differ, then we ignore the noun. This highlights our belief that adjectives are the dominant sentiment modifiers in an adjective-noun pair, so for example, even if a noun is positive, like wedding, an adjective such as horrible would completely change the sentiment of the combined pair. And so, for these sign mismatch cases, we chose the adjective’s sentiment alone. In the other case, when the signs of the adjective and noun were the same, whether both positive (e.g. happy wedding) or both negative (e.g. scary spider), we simply allowed the ANP sentiment score to be the unweighted sum of its parts. ANP candidates with zero sentiment score were filtered out.

^5 www.winedt.org
^6 www.wikipedia.org and www.ssa.gov, respectively
^7 www.geobytes.com
^8 For four non-English languages with the highest ANP counts, we have verified that only a small percentage of non-neutral ANPs (less than 2%) reverse sentiment polarity after translation, confirming similar observations in the previous work.
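Equation (1) can be written as a short function. The sketch below assumes the per-word scores have already been obtained from the two sentiment ontologies and treats sgn(0) as 0, which the text does not specify; the numeric scores used in the example are made up for illustration.

```python
def sign(x):
    """Sign of a score: -1, 0 or +1 (sgn(0) = 0 is an assumption)."""
    return (x > 0) - (x < 0)


def word_score(sentistrength, sentiwordnet):
    """Arithmetic mean of the two ontology scores for one word."""
    return (sentistrength + sentiwordnet) / 2.0


def anp_sentiment(s_adj, s_noun):
    """Eq. (1): adjective alone on a sign mismatch, unweighted sum otherwise."""
    if sign(s_adj) != sign(s_noun):
        return s_adj
    return s_adj + s_noun


def is_non_neutral(s_adj, s_noun):
    """Candidates with a zero sentiment score are filtered out."""
    return anp_sentiment(s_adj, s_noun) != 0


# Illustrative (made-up) scores: a negative adjective with a positive noun.
horrible_wedding = anp_sentiment(-0.75, 0.5)
happy_wedding = anp_sentiment(0.5, 0.25)
```

With these made-up inputs, the mismatch case keeps only the adjective score, while the matching case adds both scores.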

Frequency: Good ANPs are those which are actually used together. Here, we loosely defined an ANP’s “frequency” of usage as its number of occurrences as an image tag on Flickr. When computing counts for each pair, we accounted for language-specific syntax like the ordering of adjectives and nouns. Following anthropology research, we considered the two dominant orderings (91.5% of the languages worldwide): adj-noun and noun-adj. We also “merged” simplified and traditional forms in Chinese by considering them to be from the same language pool but distinct character sets. In addition, we considered the possible intermediate Chinese character during our frequency counting. For all non-English languages, we retained all ANPs that occurred at least once as an image tag; but for English, since most Flickr users are English-speaking, we set a higher frequency threshold of 40.
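A minimal sketch of the frequency filter follows. It makes two assumptions that the text leaves open: counts for both orderings (adj-noun and noun-adj) are summed for every language, and the English threshold is inclusive (at least 40 occurrences). The names `anp_frequency` and `passes_frequency` are hypothetical.

```python
from collections import Counter


def anp_frequency(adjective, noun, tag_counts):
    """Occurrences of the pair as an image tag, in either word order."""
    return tag_counts[f"{adjective} {noun}"] + tag_counts[f"{noun} {adjective}"]


def passes_frequency(adjective, noun, tag_counts, language):
    """English uses a threshold of 40; other languages need one occurrence."""
    threshold = 40 if language == "english" else 1
    return anp_frequency(adjective, noun, tag_counts) >= threshold


# Toy tag counts for illustration only.
spanish_counts = Counter({"comida deliciosa": 3, "deliciosa comida": 1})
english_counts = Counter({"good food": 39})

spanish_freq = anp_frequency("deliciosa", "comida", spanish_counts)
```

Using a `Counter` means a missing ordering simply contributes zero instead of raising a `KeyError`.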

Diversity: The sheer frequency of an adjective-noun pair occurrence alone was not sufficient to ensure a pair’s pervasive use in a language. We also checked if the ANP was used by a non-trivial number of distinct Flickr users for a given language. We identified the number of users contributing to uploads of images for each ANP and found a power law distribution in every language. To avoid this uploader bias, we removed all ANPs with fewer than three uploaders. Many removed candidate pairs came from companies and merchants for advertising and branding.

To further ensure diversity in our MVSO, we subsampled nouns in every language by limiting to the 100 most frequent ANPs per adjective so that we do not have, for example, the adjective surprising modifying every possible noun in our corpus. In addition, we performed stem unification by checking and including only the inflected form (e.g. singular/plural) of an ANP that was most popular in usage as a tag on Flickr. This unification also filtered some candidate ANPs, as some “duplicates” were present but simply in different inflected forms.
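The two diversity steps above (a minimum number of distinct uploaders and a cap on ANPs per adjective) can be sketched together. The record layout, the function name `diversity_filter`, and the order in which the two steps are applied are assumptions made for illustration; stem unification is not covered.

```python
from collections import defaultdict


def diversity_filter(anps, min_uploaders=3, max_per_adjective=100):
    """Keep ANPs with enough distinct uploaders, then cap ANPs per adjective.

    anps: list of dicts with keys "adjective", "noun", "frequency" and
    "uploaders" (a set of uploader ids).
    """
    kept = [a for a in anps if len(a["uploaders"]) >= min_uploaders]

    by_adjective = defaultdict(list)
    for anp in kept:
        by_adjective[anp["adjective"]].append(anp)

    result = []
    for group in by_adjective.values():
        # Most frequent pairs first; keep at most max_per_adjective of them.
        group.sort(key=lambda a: a["frequency"], reverse=True)
        result.extend(group[:max_per_adjective])
    return result


# Toy candidates for illustration only.
candidates = [
    {"adjective": "happy", "noun": "kids", "frequency": 90, "uploaders": {"u1", "u2", "u3"}},
    {"adjective": "happy", "noun": "dog", "frequency": 50, "uploaders": {"u1", "u2", "u4"}},
    {"adjective": "happy", "noun": "face", "frequency": 70, "uploaders": {"u2", "u3", "u5"}},
    {"adjective": "cheap", "noun": "deal", "frequency": 500, "uploaders": {"shop"}},
]
filtered = diversity_filter(candidates, max_per_adjective=2)
filtered_names = [f'{a["adjective"]} {a["noun"]}' for a in filtered]
```

In the toy data, the high-frequency pair from a single uploader is dropped despite its count, which mirrors the advertising and branding case mentioned above.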

3.3 Crowdsourcing Validation 

A further inspection of the corpus after the automatic filtering process showed that some issues could not be completely solved in an automatic fashion. Common errors included many fundamental natural language processing challenges like confusions in named entity recognition (e.g. big apple), language mixing (e.g. adjective in English + noun in Turkish), grammar inconsistency (e.g. adj-adj, or verb-noun) and semantic incongruity (e.g. happy happiness). So to refine our multilingual visual sentiment ontology, we crowdsourced a validation task. For each language, we asked native-speaking workers to evaluate the correctness of ANPs after automatic filtering. We collected judgements using CrowdFlower,^9 a crowdsourcing platform that distributes small tasks to a large number of workers, where we limited workers by their language expertise. We note that while we elected to perform this additional stage of crowdsourcing, other researchers may find a fully automatic pipeline more desirable, so in our public release, we also release the pre-crowdsourced version of our MVSO.

^9 www.crowdflower.com

Language  #cand  #users  #coun  %correct  %agree
Arabic       81      10      7      0.57    0.90
Chinese    1055      56     24      0.63    0.83
Dutch      1874      45      2      0.23    0.92
English    5369     223     52      0.78    0.84
French     5840     152     37      0.43    0.86
German     3360     119     27      0.32    0.90
Italian    4996     216     42      0.57    0.88
Persian      65       6      6      0.37    0.86
Polish      159       6      1      0.52    0.93
Russian     294      13      3      0.70    0.89
Spanish    4992     190     30      0.70    0.89
Turkish     701      61     22      0.66    0.84

Table 3: Crowdsourcing results via no. of input candidate ANPs #cand, #users, countries #coun, and perc. of ANPs accepted %correct and annotator agreement %agree.

3.3.1 Crowdsourcing Setup 

We required that each ANP was evaluated by at least three independent workers. To ensure high quality results, we also required workers to (1) be native speakers of the language, for which CrowdFlower had its own language competency and expertise test for workers, and (2) have a good reputation according to the crowdsourcing platform, measured by workers’ performance on other annotation jobs. For three languages (Persian, Polish and Dutch), the CrowdFlower platform does not evaluate workers based on their language expertise, so we filtered workers by country of origin, selecting the countries according to the official language spoken (e.g. Netherlands, Belgium, Aruba and Suriname for Dutch).

Task Interface: The verification task for workers consisted of simply evaluating the correctness of adjective-noun pairs. At the top of each page, we gave a short summary of the job and tasked workers: “Verify that a word pair in <Language> is a valid adjective-noun pair.” Workers were provided with a detailed definition of what an adjective-noun pair is and a summary of the criteria for evaluating ANPs, i.e. it (1) is grammatically correct (adjective + noun), (2) shows language consistency, (3) shows generality, that is, is commonly used and does not refer to a named entity, and (4) is semantically logical. To guide workers, examples of correct and incorrect ANPs were provided for each criterion, where these ground truths were carefully judged and selected by four independent expert annotators. In the interface, aside from instructions, workers were shown five ANPs and simply chose between “yes” or “no” to validate ANPs.

Quality Control: Like some other crowdsourcing platforms, CrowdFlower provides a quality control mechanism called test questions to evaluate and track the performance of workers. These test questions come from pre-annotated ground truth, which in our case corresponds to ANPs with binary validation decisions for correctness. To access our task at all, workers were first required to correctly answer at least seven out of ten such test questions. In addition, worker performance was tracked throughout the course of the task, where these test questions were randomly inserted at certain points, disguised as normal units. For each language, we asked language experts to select ten correct and ten incorrect adjective-noun pairs from each language corpus to serve as the test questions.

3.3.2 Crowdsourcing Results 

To measure the quality of our crowdsourcing, we looked at the annotator agreement along each validation task. For all languages, the agreement was very strong with an average annotator agreement of 87%, where workers agreed on either the correctness or incorrectness of ANPs. We found that workers tended to agree more that ANPs were correct than that they were incorrect. This was likely due to the wide range of possible criteria for rejecting an ANP, where some criteria are easy to evaluate (e.g. language consistency), while others, such as general usage versus named entity, may cause disagreement among workers due to their cultural background. For example, not all workers may agree that an ANP like big eyes or big apple refers to a named entity. However, for languages where the agreement on the incorrect ANPs was high, namely Arabic, German, and Polish, the average annotator agreement as a percentage of all ANPs for that language was at least 90% (Table 3).

On average, our crowdsourcing validated that a large share of the input candidate ANPs from our automatic ANP discovery and filtering process were indeed correct ANPs. English, Spanish and Russian were the top three languages for which the automatic pipeline performed the best, where more than three in five ANPs were approved by the crowd judgements. However, for certain languages, including German, Dutch, Persian and French, the number of ANPs rejected by the crowd was actually greater than the number accepted due to a higher occurrence of mixed language pairs, e.g. witzig humor. In Table 3, we summarize statistics from our crowdsourcing experiments according to the number of ANPs, percentage of correct/incorrect ANPs by worker majority vote, and average agreement.
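The majority vote and a simple per-ANP agreement measure can be sketched as follows. The agreement definition used here (the fraction of judgments that match the majority label) is an assumption; the text does not state how the platform computes agreement, and `majority_and_agreement` is a hypothetical name.

```python
from collections import Counter


def majority_and_agreement(judgments):
    """Return the majority label and the fraction of workers who chose it.

    judgments: list of "yes"/"no" answers for one ANP (at least three).
    With an even number of judgments a tie is possible; in that case the
    label that was seen first wins.
    """
    counts = Counter(judgments)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgments)


label, agreement = majority_and_agreement(["yes", "yes", "no"])
unanimous = majority_and_agreement(["no", "no", "no"])
```

Averaging this per-ANP agreement over all ANPs of a language would give a figure comparable in spirit to the %agree column, under the stated assumption.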

4. DATASET ANALYSIS & STATISTICS 

Having acquired a final set of adjective-noun pairs for each
of the 12 languages, we downloaded images by querying the
Flickr API with ANPs using a mix of tag and metadata search. To
limit the size of our dataset, we downloaded no more than 1,000
images per ANP query and also enforced a limit of no more than 20
images from any given uploader on Flickr for increased visual
diversity. The 1,000 images were drawn from the pool of image tag
search results, but in the event that this pool contained fewer
than 1,000 images, we enlarged it to include searches on the
image title and description, i.e. the metadata. Selections from
the pool of results were always randomized, and a small number of
images that Flickr or uploaders removed, or whose privacy
settings changed midway, were discarded. In total, we downloaded
7,368,364 images across 15,630 ANPs for the 12 languages, where
English (4,049,507), Spanish (1,417,781) and Italian (845,664)
contributed the most images.
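The per-query selection described above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' crawler: the function name `sample_images`, the `(image_id, uploader_id)` tuple layout, and the choice to shuffle tag and metadata hits together once the pool is enlarged are all assumptions made for the example.

```python
import random
from collections import defaultdict

def sample_images(tag_hits, metadata_hits, max_images=1000,
                  max_per_uploader=20, seed=0):
    """Pick up to max_images results for one ANP query.

    tag_hits / metadata_hits: lists of (image_id, uploader_id) tuples
    (hypothetical layout). Metadata hits are only added when the tag
    pool alone holds fewer than max_images results.
    """
    rng = random.Random(seed)
    pool = list(tag_hits)
    if len(pool) < max_images:
        seen = {image_id for image_id, _ in pool}
        pool += [(i, u) for i, u in metadata_hits if i not in seen]
    rng.shuffle(pool)  # selections from the pool are randomized

    per_uploader = defaultdict(int)
    chosen = []
    for image_id, uploader in pool:
        if per_uploader[uploader] >= max_per_uploader:
            continue  # enforce the per-uploader cap for visual diversity
        per_uploader[uploader] += 1
        chosen.append(image_id)
        if len(chosen) == max_images:
            break
    return chosen
```

With the defaults, a query never returns more than 1,000 images and never more than 20 from one uploader, matching the limits stated in the text.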

4.1 Comparison with VSO [5]

To verify and test the efficacy of our MVSO, we provide a
comparison of our extracted English visual sentiment ontology
with that of VSO along dimensions of size (number of ANPs) and
diversity of nouns and adjectives (Figure 3). In Figure 3(a), the
overlap of English MVSO with VSO is compared with VSO alone after
applying all filtering criteria except subsampling, which might
exclude ANPs belonging to VSO. As mentioned previously, the
overlap between them is about 86%. As we vary a frequency
threshold t (as described in Sec. 3.2) over image tag counts, the
overlap converges to 100%. This confirms that the popular ANPs
covered by VSO are also covered by MVSO, an interesting finding
given the difference in the crawling time periods and approaches.
In Figure 3(b), we show that there are far more ANPs in our
English MVSO than in VSO throughout all possible values of the
frequency threshold, after applying all filtering criteria.
Similarly, as shown in Figure 3(c), given there are more
adjectives and nouns in our English MVSO, we also achieve greater
diversity than VSO.

Figure 3: Comparison of our English MVSO and VSO in Figures (a),
(b) and (c), in terms of ANP overlap, no. of ANPs, adjectives and
nouns; and with all other languages in Figures (d), (e) and (f),
in terms of the no. of ANPs, adjectives and nouns when varying
the frequency threshold t from 0 to 10,000 (on log-scale),
respectively.

In Figure 3(d), we compare the number of ANPs for the remaining
languages in MVSO with VSO after applying all filtering criteria.
The curves show that VSO has more ANPs than most of the other
languages over all values of t, the exceptions being Spanish,
Italian and French at low values of t. Our intuition is that this
is due to the popularity of English on Flickr compared to other
languages. In Figures 3(e) and 3(f), we observe that these three
languages have greater diversity of adjectives and nouns than VSO
for t < 10^, German and Dutch have greater diversity than VSO for
smaller values of threshold t, while the rest of the languages
have smaller diversity over most values of t.

4.2 Sentiment Distributions 

Returning to our research motivation from the Introduction, an
interesting question to ask is which languages tend to be more
positive or negative in their visual content. To answer this, we
computed the median sentiment value across all ANPs of each
language and ranked languages as in Fig. 4. Here, to take into
account the popularity difference among ANPs, we replicated each
ANP k times, with k equal to the number of images tagged with the
ANP, up to an upper limit $L = \alpha \times \mathrm{Avg}_i$,
where $\mathrm{Avg}_i$ is the average image count per ANP in the
$i$th language. Varying $\alpha$ results in different medians and
distributions, but the trend in differentiating positive
languages from negative ones was quite stable. We show the case
when $\alpha = 3$ in Fig. 4, indicating that there is an overall
tendency toward positive sentiment across all languages, where
Spanish demonstrates the highest positive sentiment, followed by
Italian. This surprising observation is in fact compatible with
previous research showing that there is a universal positivity
bias over languages, with Spanish being the most relatively
positive language [^. The languages with the lowest sentiment
were Persian and Arabic, followed by English.
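The popularity-weighted median described above can be illustrated with a short sketch. It is only one way to read the description, under stated assumptions: the function name `capped_weighted_median` and the `(sentiment_score, image_count)` input layout are hypothetical, and the cap is rounded down to an integer number of replicas.

```python
from statistics import median

def capped_weighted_median(anps, alpha=3.0):
    """anps: list of (sentiment_score, image_count) for one language."""
    avg = sum(count for _, count in anps) / len(anps)  # Avg_i
    cap = alpha * avg                                  # L = alpha * Avg_i
    replicated = []
    for score, count in anps:
        k = int(min(count, cap))   # replicate k times, capped at L
        replicated.extend([score] * k)
    return median(replicated)

# Toy example: one popular positive ANP and two less popular ones.
toy = [(0.5, 10), (-0.25, 2), (0.25, 3)]
print(capped_weighted_median(toy, alpha=3.0))  # cap is 15, nothing is clipped
print(capped_weighted_median(toy, alpha=1.0))  # cap is 5, the popular ANP is clipped
```

Lowering the cap reduces the influence of a few very popular ANPs on the median, which is the stated purpose of the upper limit.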

The sentiment distributions (Fig. 4, right) also showed
interesting phenomena: Spanish, being the most positive language,
also has the highest variation in sentiment, while German has the
most concentrated sentiment distribution. Even for languages that
have the lowest median sentiment values, sentiment was
concentrated in a small range near zero (between 0 and -0.1).
Figure 4: Median sentiment computed over all ANPs per language is
shown on left, and the sentiment distribution using box plots on
the right (zoomed at 90% of the distributions). On right,
languages are sorted by median sentiment in ascending order (from
the left).


4.3 Emotion Distributions 

Another interesting question arises when considering
co-occurrence of ANPs with the emotions in different languages.
While our adjective-noun pair concepts were selected to be
sentiment-biased, emotions still represent the root of our
framework since we built MVSO out from seed emotion terms. So
aside from sentiment, which focuses on only
positivity/negativity, what are probable mappings of ANPs to
emotions for each language? What emotions are most frequently
occurring across languages? Given the set of keywords
$\{e_{ij}^{(l)} \mid i = 1 \ldots 24,\ j = 1 \ldots n_i\}$
describing each emotion $i$ per language $l$, where $n_i$ is the
number of keywords per emotion $i$, the set of ANPs belonging to
language $l$, with each ANP noted as $x$, and the number of images
tagged with both ANP $x$ and emotion keyword $e_{ij}^{(l)}$,
$\{C_{ij}^{(l)}(x) \mid i = 1 \ldots 24,\ j = 1 \ldots n_i\}$, we
define the probabilities of emotion for each ANP $x$ in language
$l$ as:

$$\mathrm{emo}_i^{(l)}(x) = \frac{\frac{1}{n_i}\sum_{j=1}^{n_i} C_{ij}^{(l)}(x)}{\sum_{i=1}^{24}\frac{1}{n_i}\sum_{j=1}^{n_i} C_{ij}^{(l)}(x)} \in [0, 1] \qquad (2)$$

Figure 5: Probabilities of emotions per language with respect to
their visual sentiment content. Emotions are ordered by the sum
of their probabilities across languages (left to right) and
clipped for better visualization. Each row sums to 1.


Note the model in Eq. (2) does not take into account correlation
among emotions, where for example, by an image tagged with
“ecstasy,” users may also imply “joy” even though the latter is
not explicitly tagged. These correlations can be easily accounted
for by smoothing co-occurrence counts $C_{ij}$ over correlated
emotions, e.g. the co-occurrence counts of an ANP tagged with
“ecstasy” can be included partially in the co-occurrence count of
“joy.” Regardless, still based on Eq. (2), we compute a normalized
emotion score per language $l$ and emotion $i$ as:

$$\mathrm{score}_i(l) = \frac{\sum_{x} \mathrm{emo}_i^{(l)}(x) \cdot \mathrm{count}(x)}{\sum_{i=1}^{24}\sum_{x} \mathrm{emo}_i^{(l)}(x) \cdot \mathrm{count}(x)}$$

where the sums over $x$ run over all ANPs of language $l$.


Figure 5 shows these scores per language and Plutchik emotion
[34] on a heatmap diagram. Scores in each row sum to 1 (over 24
emotions). The emotions are ordered by the sum of their scores
across languages. The top-5 emotions across all languages are
joy, serenity, interest, grief and fear. The highest ranked
emotion is joy in Russian, Chinese and Arabic. Two other emotions
in the top-5 were also positive: serenity, a highly ranked
emotion for Dutch, Italian, Chinese and Persian, and interest for
English, Turkish and Dutch. The remaining two emotions in the
top-5 were negative: grief for Persian and Turkish, and fear,
which was highly ranked in German and Polish. We also observed
that pensiveness was top ranked for Persian and Polish, vigilance
for French, rage for German, and apprehension and distraction for
Spanish. We note that these results are more concrete for
languages with many ANPs (>1000) and less conclusive for those
with few ANPs like Arabic and Persian.
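The emotion probabilities and per-language scores described in this section can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the nested-list layout for the co-occurrence counts, the function names, the zero-count guards, and the reading of count(x) as the number of images for ANP x are all assumptions, and the toy example uses 3 emotions instead of 24.

```python
def emotion_probs(cooc):
    """cooc[i] holds the counts C_ij(x) for emotion i, one per keyword j."""
    means = [sum(c) / len(c) for c in cooc]   # (1/n_i) * sum_j C_ij(x)
    total = sum(means)
    if total == 0:
        return [0.0] * len(means)             # ANP never co-occurs with any keyword
    return [m / total for m in means]

def language_scores(anps):
    """anps: list of (cooc, count) pairs, one per ANP x of a language."""
    n_emotions = len(anps[0][0])
    weighted = [0.0] * n_emotions
    for cooc, count in anps:
        for i, p in enumerate(emotion_probs(cooc)):
            weighted[i] += p * count          # emo_i(x) * count(x)
    norm = sum(weighted)
    if norm == 0:
        return weighted
    return [w / norm for w in weighted]       # scores sum to 1 over emotions

# Toy language with two ANPs and three emotions.
toy = [
    ([[2, 0], [1], [0, 0, 0]], 10),
    ([[0, 0], [4], [3, 3, 3]], 30),
]
print(language_scores(toy))
```

Averaging over keywords before normalizing keeps emotions with many keywords from dominating simply because they have more chances to co-occur.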


5. CROSS-LINGUAL MATCHING 

To gauge the topics commonly mentioned across different cultures
and languages, we analyzed alignments of translations for each
ANP to English as a basis. Two approaches were taken to study
this: exact and approximate alignment. We ensured that
translations of ANPs also passed all our validation filters
described in Sec. 3.2 for this analysis.
Figure 6: Percentage of times ANPs from one language (columns)
were translated to [Left] the same phrase, or to [Right] the
phrase in the same cluster as in another language (rows).


Figure 7: Examples of noun clusters (left; Step 1: noun-based
clustering) and ANP sub-clusters (right; Step 2: ANP-based
sub-clustering) from our two-stage clustering for cross-lingual
matching. For visualization, word2vec [32] vectors were projected
to two dimensions using t-SNE [40].


Exact Alignment: We grouped ANPs from each language that have
the exact same translation. For example, old books was the
translation for one or more ANPs from seven languages, including
(Chinese), livres anciens (French), vecchi libri (Italian),
старые книги (Russian), libros antiguos (Spanish), eski kitaplar
(Turkish). The translation covered by the greatest number of
languages was beautiful girl, with ANPs from ten languages.
Figure 6 (left) shows a correlation matrix of the number of times
ANPs from pairs of languages appeared together in a set with the
exact same translation, e.g. out of all the translations that
German ANPs were translated to (782), more of them were
translated to the same phrase as ANPs used by Dutch speakers (39)
than as ANPs used by Chinese speakers (23). This was striking
given that there were fewer (340) translation phrases from Dutch
than from Chinese (473).
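As a rough illustration of exact alignment, the grouping and the pairwise counts behind a matrix like the one described above could be computed as follows. The tuple layout, the function names, and the lowercase/strip normalization are assumptions for this sketch; the text does not say how translations were normalized before matching.

```python
from collections import defaultdict
from itertools import combinations

def exact_alignment(translations):
    """translations: iterable of (language, original_anp, english_translation)."""
    groups = defaultdict(set)
    for lang, _anp, english in translations:
        groups[english.lower().strip()].add(lang)  # assumed normalization
    return groups

def shared_counts(groups):
    """Count, for each language pair, the translations they have in common."""
    counts = defaultdict(int)
    for langs in groups.values():
        for a, b in combinations(sorted(langs), 2):
            counts[(a, b)] += 1
    return counts

translations = [
    ("French", "livres anciens", "old books"),
    ("Italian", "vecchi libri", "old books"),
    ("Spanish", "libros antiguos", "old books"),
    ("French", "beau village", "beautiful village"),
]
groups = exact_alignment(translations)
print(sorted(groups["old books"]))
print(dict(shared_counts(groups)))
```

Dividing each pairwise count by the number of distinct translations of a language would give percentages of the kind reported in Figure 6.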

Approximate Alignment: Translations can be inaccurate,
especially when capturing underlying semantics where context is
not provided. We therefore relaxed the strict condition for exact
matches by approximately matching using a hierarchical two-stage
clustering approach instead. First, we extracted nouns using
TreeTagger from the list of translated phrases and discovered
3,099 total nouns. We then extracted word2vec features, a word
representation trained on a Google News corpus (news.google.com),
for these translated nouns (188 nouns were out-of-vocabulary),
and performed k-means clustering (k=200) to get groups of nouns
with similar meaning. The number of clusters was picked based on
the coherence of clusters: we picked the number where the inertia
value of the clustering started saturating while gradually
increasing k. In the second stage of our hierarchical clustering,
we split phrases from the translations into different groups
based on the clusters their nouns belonged to. We extracted
word2vec [32] features from the full translated phrase in each
cluster and ran another round of k-means clustering (adjusting k
based on the number of phrases in each cluster, where phrases in
each noun-cluster ranged from 3 to 253). This two-stage
clustering enables us to create a hierarchical organization of
our ANPs across languages and form a multilingual ontology over
visual sentiment concepts (MVSO), unlike the flat structure in
VSO [^. We discovered 3,329 sub-clusters of ANP concepts, e.g.
resulting in clusters containing little pony and little horse as
in Figure 7. This approach also yielded a larger intersection
between languages, where German and Dutch share 118 clusters, and
German and Chinese intersect over 101 ANP clusters.


The correlation matrix from this approximate matching is shown
in Figure 6 (right), along with one subtree from our ontology by
hierarchical clustering in Figure 7. For Figure 7, we projected
the data to two dimensions using t-SNE dimensionality reduction
[40]. On the left, six clusters composed of different sets of
nouns are shown, including the clusters sunlight-rays-glow and
dog-cat-pony. On the right, we show the sub-clustering of ANPs
for the dog-cat-pony cluster in A, giving us noun groupings
modified by sentiment-biasing adjectives to get ANPs like funny
dog-funny cats and adopted dog-abandoned puppy.
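The two-stage clustering used for approximate alignment can be sketched with a small, dependency-free k-means. Everything here is illustrative: the toy 2-D vectors stand in for word2vec features, `kmeans` and `two_stage` are hypothetical names, and the fixed `k_sub` replaces the paper's per-cluster choice of k, which the text does not specify in detail.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each vector."""
    rng = random.Random(seed)
    k = min(k, len(vectors))
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        for n, v in enumerate(vectors):
            labels[n] = min(range(k), key=lambda c: sq_dist(v, centers[c]))
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def two_stage(phrases, noun_vec, phrase_vec, k_nouns, k_sub):
    """phrases: list of (phrase, noun). Returns {(noun_cluster, sub_cluster): [phrases]}."""
    # Stage 1: cluster the nouns.
    nouns = sorted({noun for _, noun in phrases})
    noun_labels = dict(zip(nouns, kmeans([noun_vec[n] for n in nouns], k_nouns)))
    by_cluster = {}
    for phrase, noun in phrases:
        by_cluster.setdefault(noun_labels[noun], []).append(phrase)
    # Stage 2: within each noun cluster, cluster the full phrases.
    ontology = {}
    for cluster, members in by_cluster.items():
        sub = kmeans([phrase_vec[p] for p in members], k_sub)
        for p, s in zip(members, sub):
            ontology.setdefault((cluster, s), []).append(p)
    return ontology
```

In practice one would replace the toy vectors with real word embeddings and pick the number of noun clusters by watching where the clustering inertia saturates, as the text describes.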



6. VISUAL SENTIMENT PREDICTION 

To test the effectiveness of a vision-based approach for visual
affect understanding when crossing languages, we designed and
built language-specific sentiment predictors using the data
collected with MVSO. Inspired by work in [17], we studied the
extent to which the visual sentiments of a given language can be
predicted by sentiment models of other languages. We chose to
focus on a sentiment prediction task, i.e. predicting whether an
image is of positive or negative sentiment, because there is a
large body of work expressly focused on sentiment (e.g. [^[^[^)
and because it is simpler than emotion prediction. More
importantly, we wanted to reduce the number of variables to be
analyzed since our primary goal was to uncover cross-lingual
differences.

We first constructed a bank of visual concept detectors, similar
to prior work, for our final MVSO adjective-noun pairs. For
simplicity, we focused on the six languages with the most ANPs
and associated images in our dataset: in decreasing order,
English, Spanish, Italian, French, German and Chinese. Combined,
these six languages account for 94.7% of the ANPs in MVSO and
98.4% of the images in our dataset. However, to ensure that there
were enough training images for each ANP, only the ANPs with at
least 125 images were selected for model training and
prediction. This reduced the combined ANP coverage to 63.5% but
still ensured 92.0% coverage for images. For each ANP, the images
were split randomly 80/20% train/test, respectively.
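A minimal sketch of the filtering and per-ANP split described above is given below. The function name `filter_and_split`, the dictionary layout, and the fixed random seed are assumptions for illustration only.

```python
import random

def filter_and_split(images_by_anp, min_images=125, train_frac=0.8, seed=0):
    """Keep ANPs with at least min_images images and split each one 80/20."""
    rng = random.Random(seed)
    splits = {}
    for anp, images in images_by_anp.items():
        if len(images) < min_images:
            continue  # too few images to train a detector for this ANP
        shuffled = list(images)
        rng.shuffle(shuffled)
        cut = int(round(train_frac * len(shuffled)))
        splits[anp] = (shuffled[:cut], shuffled[cut:])
    return splits

toy = {"old books": list(range(200)), "rare anp": list(range(50))}
splits = filter_and_split(toy)
train, test = splits["old books"]
print(len(train), len(test))
```

Splitting within each ANP, rather than over the pooled images, keeps every retained concept represented in both the training and test sets.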


6.1 Visual Sentiment Concept Detectors 

To construct our bank of visual concept detectors of ANPs, we
used convolutional neural networks (CNNs), in particular,
adopting an AlexNet-styled architecture for its good performance
on large-scale vision recognition and detection tasks. To train
our detector bank, we fine-tuned six models, one for each
language, where network weights were initialized with
DeepSentiBank [^, an AlexNet model trained on the VSO dataset.
This fine-tuning approach ensures that each network begins with
weights that are already somewhat “affectively” biased. The base
learning rates were set to 0.001 and the number of output neurons
in the last fully connected layer was set to the number of
training ANPs of each language. Step sizes for reducing the
learning rate in the second stage were set proportional to the
number of training images per language. For a single language,
fine-tuning took between 12 and 40 hours for convergence on a
single NVIDIA GTX 980 GPU implemented with Caffe [^. From Table
4, as expected, we achieve higher top-1 and top-5 accuracies than
DeepSentiBank [^, even when the numbers of output neurons in
English and Spanish are higher than those in [^. Top-k accuracy
refers to the percentage of classifications in which the true
class is among the top k predicted ranks.

Figure 8: Example top-5 classification results from our
multilingual visual sentiment detector bank. Translations to
English provided for convenience.

  Chinese: (used bookstore); (old photo); (small home);
           (ancient architecture); (important cultural)
  German:  müde katze (tired cat); kleiner hund (small dog);
           kurze pause (short break); bester freund (best friend);
           große welt (great world)
  English: running group; grand challenge; last chance
  French:  ciel orageux (stormy sky); ciel menaçant (threatening
           sky); ciel chargé (charged sky); mauvais temps (bad
           weather); gros nuage (big cloud)
  Italian: flora silvestre (wild flora); piccolo fiore (small
           flower); fiori spontanei (wild flowers); erba medica
           (medical grass); pieno sole (full sun)
  Spanish: paisaje nevado (snowy landscape); camino nevado (snowy
           road); arboles nevados (snowy trees); arboles desnudos
           (bare trees); camino natural (natural path)
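Top-k accuracy as defined above can be computed with a few lines. This is a generic sketch (the function name and the list-of-lists score layout are assumptions), not the evaluation code used to produce the reported numbers.

```python
def top_k_accuracy(scores, labels, k=5):
    """scores: one list of per-class scores per image; labels: true class indices."""
    hits = 0
    for row, truth in zip(scores, labels):
        # Indices of the k highest-scoring classes for this image.
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += truth in top
    return hits / len(labels)

scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

Because each language has a different number of ANP classes, top-k accuracies are not directly comparable across rows of a results table without keeping the class counts in mind.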

6.2 Sentiment Prediction on Flickr 

We used the CNN-based visual concept models trained for each
language to extract image features and used the sentiment scores
of ANPs as supervised labels to learn sentiment prediction
models. We compared different layers of the CNN models as image
features. To simplify the process, we binarized the ANP
sentiment scores computed with Eq. (1), i.e. into positive and
negative classes, and learned a binary classifier using linear
SVMs, one for each language. The training images are those
associated with ANPs of strong sentiment scores (absolute values
higher than 0.05). Splits of training and test sets were
stratified across all languages so that the amount of training
and testing data for positive and negative sentiment classes was
the same, for fair cross-lingual experiment comparison.

          #ANPs  #train     #test    lrs (K)  time (hr)  top-1  top-5
English   4,342  3,236,728  807,447  50       40         10.1%  21.7%
Spanish   2,382  1,085,678  270,400  40       35         12.4%  25.4%
Italian   1,561  602,424    149,901  30       30         17.0%  30.9%
French    1,115  462,522    115,112  30       26         17.7%  35.5%
German    275    108,744    27,048   20       12         30.1%  52.8%
Chinese   243    102,740    25,575   20       15         27.1%  45.0%
DSB       2,089  826,806    41,113   -        -          8.2%   19.1%

Table 4: Adjective-noun pair (ANP) classification performance on
Flickr images for six major languages in MVSO and compared to
DeepSentiBank (DSB) [^. No. of visual sentiment concepts (#ANPs),
#train and #test images along with learning rate step size (lrs,
in thousands) are shown with training times (in hours), top-1 and
top-5 accuracies.

Figure 9: Image-based, cross-lingual domain transfer sentiment
prediction results with language-specific models applied on
cross-lingual examples. [Axis label: Language-specific Sentiment
Model]
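The label preparation described above (binarizing ANP sentiment scores and balancing the classes) might look like the sketch below. It reflects one reading of the text: the 0.05 threshold comes from the paper, while the function name, the `(image_id, score)` layout, and the choice to balance by random downsampling of the larger class are assumptions. Training the linear SVM itself is omitted.

```python
import random

def binarize_and_balance(samples, threshold=0.05, seed=0):
    """samples: list of (image_id, anp_sentiment_score).

    Returns (image_id, label) pairs with label 1 = positive, 0 = negative,
    keeping only strongly polarized ANPs and equal class sizes.
    """
    rng = random.Random(seed)
    pos = [img for img, s in samples if s > threshold]
    neg = [img for img, s in samples if s < -threshold]
    n = min(len(pos), len(neg))          # downsample the larger class
    pos, neg = rng.sample(pos, n), rng.sample(neg, n)
    return [(img, 1) for img in pos] + [(img, 0) for img in neg]

toy = [("a", 0.4), ("b", 0.01), ("c", -0.3), ("d", 0.2), ("e", -0.04)]
print(binarize_and_balance(toy))
```

Images whose ANP score falls within the threshold band around zero are dropped, so the classifier is trained only on clearly positive or clearly negative concepts.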

We found that the softmax output features from the penultimate
layer of each language’s CNN model performed the best for all
languages, and we show the resulting sentiment prediction results
in Figure 9. Each language expectedly did better in predicting
test samples from its own language, but in addition, Chinese was
generally the most difficult to predict with models trained on
other languages; and using a sentiment model trained over Chinese
images to predict the sentiment in other languages was also the
worst on average. We speculate that this is due to the difference
in visual sentiment portrayal between Eastern and Western
cultures. Interestingly, the classification of French and Italian
sentiments was the most consistent using models from all
languages. We also observed good performance in cross-lingual
prediction for Latin languages, i.e. Spanish, Italian and French,
where Italian was the best cross-lingual classifier for Spanish
and French sentiment, and Spanish was best for Italian sentiment,
followed by French. Despite not performing as well as others on
average, the English-specific sentiment model had the least
variance in its accuracy across all languages, likely due to the
pervasiveness of English worldwide and across cultures.

In Figure 10, we show three classification example results from
our cross-lingual sentiment prediction. On the left, an image
from the Italian test set representing the costumi tradizionali
concept was labeled as positive via sentiment scoring, but was
predicted by the German model to be negative; this may be due to
differences in cultural perceptions of traditional clothing. In
the center, the Chinese model wrongly predicted an image from the
English test set of foggy morning as positive, possibly for its
resemblance to a Chinese painting. And on the right, an image of
a beau village from the French test set was successfully
classified as positive with the Spanish sentiment predictor.
These examples and preliminary experiments highlight some
similarities and differences in how visual sentiment is expressed
and perceived by various cultures.

                   Left                   Center             Right
Original Language  Italian                English            French
Adj-Noun Pair      costumi tradizionali   foggy morning      beau village
                   (traditional costume)                     (beautiful village)
Prediction Model   German                 Chinese            Spanish
Truth/Predicted    positive/negative      negative/positive  positive/positive

Figure 10: Classification examples from cross-lingual sentiment
prediction. The model from a source language is used to predict
the sentiment of a target language image where the true label
comes from the sentiment of the associated ANP.
7. CONCLUSION & FUTURE WORK

We proposed a new multilingual discovery method for visual
sentiment concepts and showed its efficacy on a social multimedia
platform for 12 languages. We based our approach on the
psychological theory that emotions are culture-specific and carry
inherent linguistic context, and so we showed how to use
language-specific part-of-speech labeling along with progressive
filtering to achieve coverage and diversity of visual affect
concepts in multiple languages. In addition, we presented a
two-stage hierarchical clustering approach to unify our ontology
across languages. We also make our Multilingual Visual Sentiment
Ontology (MVSO), both pre- and post-crowdsourcing, and our image
dataset available to the public. A cross-lingual analysis of our
large-scale MVSO and image dataset using semantic matching and
visual sentiment prediction hints that emotions are not
necessarily culturally universal. Our preliminary results show
that there are indeed commonalities, but also distinct
separations, in how visual affect is expressed and perceived,
where other works assumed only commonalities. We believe these
point to the colorful diversity of our world, rather than